High-performance gene name normalization with GeNo

Authors

  • Joachim Wermter
  • Katrin Tomanek
  • Udo Hahn
Abstract

MOTIVATION: The recognition and normalization of textual mentions of gene and protein names is both particularly important and challenging. Its importance lies in the fact that they constitute the crucial conceptual entities in biomedicine. Their recognition and normalization remains a challenging task because of widespread gene name ambiguities within species, across species, with common English words and with medical sublanguage terms.

RESULTS: We present GeNo, a highly competitive system for gene name normalization, which obtains an F-measure performance of 86.4% (precision: 87.8%, recall: 85.0%) on the BioCreAtIvE-II test set, thus being on a par with the best system on that task. Our system tackles the complex gene normalization problem by employing a carefully crafted suite of symbolic and statistical methods, and by fully relying on publicly available software and data resources, including extensive background knowledge based on semantic profiling. A major goal of our work is to present GeNo's architecture in a lucid and perspicuous way to pave the way to full reproducibility of our results.

AVAILABILITY: GeNo, including its underlying resources, will be available from www.julielab.de. It is also currently deployed in the Semedico search engine at www.semedico.org.
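The F-measure reported in the abstract can be checked against the stated precision and recall. The sketch below assumes the reported figure is the balanced F-measure (the harmonic mean of precision and recall, often written F1); the abstract does not say so explicitly, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean of P and R.

    Assumption: the abstract's "F-measure" is the balanced (beta = 1) variant.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall as reported for the BioCreAtIvE-II test set.
f1 = f_measure(precision=0.878, recall=0.850)
print(f"{100 * f1:.1f}%")  # prints 86.4%
```

Under that assumption, the three reported numbers are mutually consistent: 2 × 0.878 × 0.850 / (0.878 + 0.850) ≈ 0.864.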

Similar resources

An enhanced CRF-based system for disease name entity recognition and normalization on BioCreative V DNER Task

Disease plays a central role in many areas of biomedical research and healthcare. However, the rapid growth of disease and treatment research creates barriers to the knowledge aggregation of PubMed database. Thus, a framework of disease mention recognition and normalization has become increasingly important for biomedical text mining. In this work, we utilize conditional random fields (CRFs) to...

Species taxonomy for gene name normalization

Background: The task of gene normalization is to assign a unique identifier from a database to the gene mentions. Using these identifiers a great deal of information can be gathered from external databases such as interactions, pathways, sequences and protein structures. Normalizing gene mentions in articles is a difficult task as the inter-species ambiguity of the gene mentions in biomedical p...

Identifying and determining protected zone, rehabilitation and special Use in Geno protected area and Investigating of intensity of sediment delivery

Because the Geno protected area has not been adequately protected, large parts of it have been used for livestock production, and this ecosystem, with its special plant and animal species, has been degraded. This study was done with the following goal: determining suitable regions for soil protection, rehabilitation and special use according to the potential of dry regions. We use these maps: slope, aspect, elevation, soil, plant cover and r...

Classifying Gene Sentences in Biomedical Literature by Combining High-Precision Gene Identifiers

Gene name identification is a fundamental step to solve more complicated text mining problems such as gene normalization and protein-protein interactions. However, state-of-the-art name identification methods are not yet sufficient for use in a fully automated system. In this regard, a relaxed task, gene/protein sentence identification, may serve more effectively for manually searching and brows...

Rapid Screening of MDR-TB in Cases of Extra Pulmonary Tuberculosis Using Geno Type MTBDRplus

BACKGROUND Drug resistance in tuberculosis is a major public health challenge in developing countries. The limited data available on drug resistance in extra pulmonary tuberculosis stimulated us to design our study on anti-tuberculosis drug resistance pattern in cases of extra pulmonary tuberculosis in a tertiary referral hospital of North India. We performed Geno Type MTBDRplus assay in compar...

Journal:
  • Bioinformatics

Volume 25, Issue 6

Pages -

Publication date: 2009